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Abstract 

It  is  becoming  increasingly  common  for  network  de¬ 
vices  to  handle  packets  based  on  the  contents  of 
packet  payloads.  Example  applications  include  in¬ 
trusion  detection,  firewalls,  web  proxies,  and  layer 
seven  switches.  This  paper  analyzes  the  problem 
of  intrusion  detection  and  its  reliance  on  fast  string 
matching  in  packets.  We  show  that  the  problem 
can  be  restructured  to  allow  the  use  of  more  ef¬ 
ficient  string  matching  algorithms  that  operate  on 
sets  of  patterns  in  parallel.  We  then  introduce  and 
analyze  a  new  string  matching  algorithm  that  has 
average-case  performance  that  is  better  than  Aho- 
Corasick,  a  popular  linear-time  algorithm  and  much 
better  than  the  iterative  use  of  Boyer-Moore  cur¬ 
rently  used  in  the  popular  intrusion  detection  platform 
Snort.  We  then  measure  the  actual  performance 
of  several  search  algorithms  on  actual  packet  traces 
and  rulesets.  Our  results  provide  lessons  on  the 
structuring  of  content-based  handlers,  string  match¬ 
ing  algorithms  in  general,  and  the  importance  of 
performance  to  security. 

1  Introduction 

While  the  ideal  approach  to  securing  a  network  is 
to  remove  all  security  holes  from  individual  hosts, 
a  less  expensive  and  more  feasible  retrofit  approach  is 
to  add  a  security  device  at  the  edge  of  the  net¬ 
work  that  can  do  monitoring  and  filtering.  The 
simplest  such  device  is  a  firewall  [21]  that,  among 
other  things,  can  specify  rules  that  specify  which 
combinations  of  packet  headers  are  allowed  through. 
However,  in  order  to  allow  connectivity  to  the  ex¬ 
ternal  world,  the  firewall  must  still  allow  large  classes 
of  packet  headers  because  they  can  come  from  le¬ 
gitimate  users.  Even  with  firewalls  it  is  possible 
for  unauthorized  users  to  mount  attacks  or  even 
for  legitimate  users  to  abuse  their  privileges.  Ex¬ 
ample  attacks  include  unauthorized  modification  of 
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files,  or  unauthorized  use  of  system  resources  such 
as  accounts. 

Network  Intrusion  Detection:  To  go  beyond 
firewalls  while  staying  within  the  retrofit  philosophy, 
a  relatively  new  area  of  security  called  network  intrusion 
detection  has  been  developed.  [23]  and  [9]  contain 
good  summaries  of  the  area.  Intrusion  detection 
is  based  on  the  premise  that  an  intruder’s  behav¬ 
ior  will  be  significantly  different  from  that  of  a  legitimate 
user;  network  based  intrusion  detection  attempts  to 
infer  such  anomalous  intruder  behavior  by  detect¬ 
ing  and  analyzing  patterns  of  packets  sent  by  an 
intruder  to  the  network.  For  example,  all  pack¬ 
ets  sent  from  the  external  world  can  be  passively 
monitored  by  the  detector.  If  certain  anomalous 
patterns  are  detected,  the  detector  could  be  com¬ 
bined  with  a  firewall  function  that  denies  access  to 
the  intruder.  Notice  that  unlike  the  pure  firewall 
approach,  detection  systems  do  not  seek  to  prevent 
all  attacks,  but  only  to  detect  them  quickly  and 
notify  a  security  officer. 

While  analyzing  network  traffic  patterns  in  real¬ 
time  is  difficult  in  its  full  generality  (indeed  many 
systems  use  statistical  [10]  or  rule-based  AI  approaches 
to  this  problem),  today’s  commonly  used  systems 
break  up  the  problem  into  two  parts:  a  signature 
detection  component  and  an  analyzer.  The  signa¬ 
ture  detection  component  only  seeks  to  prune  the 
space  of  all  network  traffic  into  a  set  of  possibly 
suspect  traffic  which  is  then  sent  to  the  analyzer 
(either  a  security  officer  or  a  program).  Ideally, 
the  signature  detection  component  should  operate  in 
real-time  and  prune  a  substantial  fraction  of  the  traffic. 

While  our  paper  concentrates  on  signature  detec¬ 
tion,  we  hasten  to  add  that  analyzer  algorithms  are  also 
very  important  and  well  worth  solving.  Events  from  the 
signature  detection  device  are  sent,  in  real-time  or 
batches,  to  the  analyzer  (also  known  as  a  moni¬ 
tor  [3],  director  [29],  manager  [34],  or  controller 
[17])  which  may  perform  data  fusion  [15]  to  corre¬ 
late  events  and  use  advanced  logic  and  algorithms 
to  determine  when  an  alert  should  be  sent.  These 
alerts  may  be  sent  to  people,  to  other  analyzers 
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as  events  [25],  or  to  an  automated  response  sys¬ 
tem  that  may  take  actions  such  as  modifying  a  firewall. 

Signature  Detection  versus  Packet  Filtering: 
Signature-based  intrusion  detection  systems  such  as 
the  popular  Snort  program  [28]  and  the  Network 
Flight  Recorder  [24]  are  typically  configured  with  a  set  of 
rules  to  detect  popular  attack  patterns.  These  rules  look 
almost  exactly  like  firewall  rule  sets  [21]  in  that  patterns 
can  be  specified  on  packet  header  fields  (e.g.,  all 
packets  from  subnet  X  and  sent  to  port  80)  us¬ 
ing  the  usual  flexibility  of  specifying  prefixes,  wild- 
carded  fields,  and  port  ranges.  However,  signa¬ 
ture  detection  systems  go  one  step  beyond  packet 
filters  in  complexity  by  also  allowing  an  arbitrary 
string  that  can  appear  anywhere  in  the  packet  pay- 
load.  This  is  harder  in  some  ways  than  firewall 
rules  because  firewall  rules  specify  packet  headers  that 
are  in  well-defined  portions  of  the  packet.  Thus 
many  firewall  rules  only  have  to  examine  roughly 
128  bits  within  the  first  40  bytes  of  a  packet  header.  By 
contrast,  a  signature  system  has  to  search  for  a 
string  that  can  be  anywhere  within  the  (say)  4500 
bytes  of  packet  payload  for  (say)  an  FDDI  packet. 

We  note  that  the  main  topic  of  this  paper,  string 
matching  in  packet  content,  is  also  of  interest  to 
many  applications  that  make  use  of  content-based 
forwarding.  Some  other  examples  include  firewalls 
scanning  for  viruses,  web  proxy  filters,  layer  seven 
switches  doing  web  load  balancing  based  on  URLs 
and  even  cookies  [5] ,  and  content  distribution  in¬ 
frastructure.  Thus  we  believe  that  string  matching 
over  packet  content  data  is  an  important  problem 
which,  despite  possible  disapproval  from  advocates 
of  network  transparency,  is  here  to  stay. 

Set  String  Searching:  Since  a  signature  based 
system  has  several  firewall  style  rules,  each  of  which  may 
have  an  associated  string  pattern,  signature  systems 
marry  all  the  complexity  of  standard  packet  clas¬ 
sification  together  with  string  matching.  Today’s 
systems  such  as  Snort  examine  a  rule  at  a  time: 
each  time  a  rule  matches.  Snort  does  a  fast  string 
search  on  the  associated  pattern  using  the  Boyer- 
Moore  algorithm.  While  the  Boyer-Moore  algorithm 
is  very  fast  for  a  single  string  search,  the  prob¬ 
lem  is  that  a  given  packet  can  match  several  rules 
with  patterns  (up  to  300  rules  in  the  sample  Snort 
database),  forcing  up  to  300  Boyer-Moore  searches. 
Thus  this  technique  does  not  scale  with  increasing 
rule  sizes;  it  even  motivates  an  algorithmic  attack 
where  an  intruder  could  send  worst-case  packets  that 
slow  down  the  search  and  cause  dropped  packets, 
upon  which  the  intruder  could  begin  her  attack. 

In  recent  years  there  have  been  several  papers  in  the 
literature  (e.g.,  [21,  13])  that  show  that  there  are 


more  scalable  approaches  to  firewall  rule  searching 
than  iterating  through  the  rule  set.  The  main  idea  is  to 
combine  all  the  rules  into  a  single  data  structure 
that  can  be  searched  once  to  find  the  first  matching 
rule.  Thus  a  natural  question,  which  we  pose  in 
this  paper,  is  why  the  same  approach  cannot  be 
taken  to  signature  systems.  Is  there,  for  example, 
a  way  to  combine  all  string  searches  into  a  single 
string  search?  If  so,  how  much  faster  is  such  an 
integrated  string  search  over  separate  string  searches  for 
currently  used  and  synthetic  rule  sets?  Would  de¬ 
ploying  such  a  new  algorithm  in  say  Snort  reduce 
the  likelihood  of  an  algorithmic  attack  succeeding? 
Our  paper  attempts  to  answer  all  these  questions. 

Paper  Contributions  and  Organization:  Thus 
the  contributions  of  our  paper  are  as  follows.  First, 
we  introduce  an  algorithmic  aspect  of  Intrusion  De¬ 
tection  as  an  important  problem  worth  solving  in 
the  same  manner  as  IP  lookups  and  packet  classi¬ 
fication.  Our  second  contribution  is  an  examina¬ 
tion  of  a  multiple-pattern  search  algorithm  that  com¬ 
bines  the  one-pass  approach  of  Aho-Corasick  with 
the  skipping  feature  of  Boyer-Moore  as  optimized 
for  the  average  case  by  Horspool.  This  sublinear 
multiple-pattern  algorithm  is  in  the  same  class  as 
those  by  Commentz- Walter  [7],  Wu  &  Manber  [35], 
Gusfield  [14],  and  Kim  &  Kim  [19],  but  to  the 
best  of  our  knowledge,  after  consulting  the  litera¬ 
ture  and  algorithm  experts,  our  setwise  algorithm 
is  a  new  algorithm  and  may  be  of  independent  interest. 

Third,  we  have  implemented  commonly  used  ver¬ 
sions  of  BM,  AC,  and  our  new  algorithm  in  a  li¬ 
brary  that  can  easily  be  instantiated  to  choose  at 
will  different  string  algorithms  for  use  by  a  signa¬ 
ture  detection  utility.  Fourth,  we  have  measured 
the  effects  of  replacing  the  string  matching  compo¬ 
nent  in  Snort  with  the  various  algorithms  in  the 
library;  the  real  comparisons  take  into  account  the  rela¬ 
tive  importance  of  string  searching  both  over  real 
traces  and  benchmarks,  and  include  artifacts  due 
to  machine-dependent  effects  such  as  cache  sizes.  Fifth, 
we  introduce  techniques  of  caching  search  results  to 
preserve  the  ordering  of  matching  rules  and  vary¬ 
ing  case  sensitivity.  Finally,  to  improve  the  per¬ 
formance  of  set-wise  string  searching  algorithms,  we  also 
explore  different  techniques  for  partitioning  rulesets. 

The  rest  of  the  paper  is  structured  as  follows. 
In  Section  2  we  examine  the  popular  intrusion  de¬ 
tection  platform,  the  structure  of  a  sample  signa¬ 
ture,  and  how  Snort  could  benefit  from  the  use 
of  better  string  matching  algorithms.  In  Section 
3,  we  explain  the  operation  of  the  popular  Boyer- 
Moore  and  Aho-Corasick  algorithms.  In  Section  4, 
we  present  our  set-wise  Boyer-Moore-Horspool  algo- 
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alert  tcp  ! $HOME_NET  any  ->  $HOME_NET  80 
(msg: "IDS219  -  WEB-CGI-Perl  access  attempt"; 

flags: PA;  content : "perl . exe" ;  nocase;) 

Figure  1:  Example  Snort  Rule 

rithm  for  searching  sets  of  strings  simultaneously  and 
analyze  its  theoretical  performance.  In  Section 
5  we  conduct  numerous  performance  tests  of  all  of 
these  algorithms  and  describe  several  key  performance 
characteristics.  By  carefully  structuring  the  prob¬ 
lem  we  are  able  to  improve  the  performance  of  Snort  by  a 
factor  of  4.5.  Finally,  we  conclude  the  paper  in  Section  6. 

2  Snort  and  String  Matching 

In  this  section,  we  motivate  and  ground  the  solu¬ 
tions  found  in  the  rest  of  the  paper  by  examining  a  popu¬ 
lar  sample  signature  system  (Snort).  After  briefly 
introducing  Snort  in  Section  2.1,  we  describe  pro¬ 
filing  information  that  shows  that  string  matching 
is  a  bottleneck.  We  then  describe  why  it  is  im¬ 
portant  to  improve  the  performance  of  Snort  as  a 
whole.  Finally,  in  Section  2.4,  we  show  two  ways 
to  use  set  string  matching  within  Snort  to  poten¬ 
tially  improve  performance. 

2.1  A  Brief  Introduction  to  Snort 

Snort  [28]  is  a  freely  available  (under  the  GNU  li¬ 
cense),  lightweight  network  intrusion  detection  sys¬ 
tem  that  is  configured  with  a  list  of  rules  that 
each  define  a  signature  and  a  corresponding  event 
log  description.  Snort  also  has  a  plug-in  architec¬ 
ture  that  allows  stateful  analysis  to  be  performed 
and  is  currently  used  by  preprocessors  that  avoid 
some  subterfuge  attacks  (as  defined  in  [26]).  Be¬ 
cause  of  its  free  availability  and  efficiency.  Snort 
is  quite  commonly  used  and  there  are  very  large 
and  current  databases  of  signatures  maintained  on 
the  Internet  [32,  30]. 

In  Figure  1  we  present  an  example  Snort  rule.  Each 
rule  contains  an  action  {alert  in  this  case),  a  protocol 
{tcp),  a  source  netmask,  typically  defined  as  anything 
but  the  network  being  monitored,  a  source  port  {any), 
and  a  destination  netmask  and  port  {80).  Following 
these  standard  fields  can  be  a  list  of  arbitrarily  many 
other  options.  The  msg  string  is  the  alert  to  send  if 
this  rule  is  matched.  The  optional  flags  field  specifies 
a  set  of  TCP  flags  that  must  be  set  for  a  packet  to 
match.  The  content  field  specifies  a  string  to  match 
in  the  packet  and  the  nocase  flag  specifies  that  the  search 
should  be  case  insensitive. 

The  alert  message  in  our  example  refers  to  IDS219 


which  is  a  unique  key  assigned  to  this  vulnerabil¬ 
ity.  Such  keys  aid  in  the  correlation  of  alerts  be¬ 
tween  different  products,  and  lead  response  person¬ 
nel  to  additional  information.  For  example,  the 
arachNIDS  web  database  [32]  of  these  vulnerabili¬ 
ties  contains  the  following  description: 

An  attempt  was  made  to  execute  perl.exe.  If 
the  Perl  interpreter  is  available  to  web  clients, 
it  can  be  used  to  execute  arbitrary  commands 
on  the  web  server.  This  can  be  used  to  break 
into  the  server,  obtain  sensitive  information, 
and  potentially  to  compromise  the  availability 
of  the  web  server  and  the  machine  it  runs  on. 

Many  web  server  administrators  inadvertently 
place  copies  of  the  Perl  interpreter  into 
their  web  server  script  directories.  If  perl 
is  executable  from  the  cgi  directory,  then 
an  attacker  can  execute  arbitrary  commands 
on  the  web  server.  CERT  Advisory  CA- 
96.11  .interpreters  _in_cgi  _bin  _dir  ^ 

Over  30  vendors  and  security  organizations  are 
currently  participating  to  create  a  Common  Vul¬ 
nerabilities  and  Exposures  ‘dictionary’  of  common 
vulnerabilities  [22].  This  example  rule  is  currently 
a  candidate  for  inclusion  in  the  CVE,  but  has  yet 
to  be  vetted  by  the  CVE  editorial  board  of  secu¬ 
rity  experts. 

Most  Snort  signatures  contain  a  payload  byte  string 
(84%  of  the  January  8,  2001  Snort.org  full  ruleset).  In 
this  paper,  we  examine  algorithms  for  improving  the 
performance  of  string  matching  for  these  rulesets.  Some 
other  intrusion  detection  systems  such  as  Bro  [26]  and 
NFR  [24]  have  specialized  packet  handling  languages  in 
which  inspection  programs  can  be  written.  It  is  unlikely 
that  the  same  degree  of  optimization  can  be  applied  to 
those  systems.  However,  most  commercial  systems  such 
as  those  from  Cisco  use  string  matching  rules  like  those 
in  Snort. 

2.2  Snort  Performance  Profile 

In  [28],  it  is  asserted  that  string  matching  is  the 
most  computationally  expensive  test  that  Snort  com¬ 
monly  performs  on  packets.  Profiling  of  a  typical 
Snort  configuration  supports  this  conclusion.  Ac¬ 
tual  performance  is  clearly  dependent  on  both  the 
ruleset  used  and  the  characteristics  of  the  traffic  be¬ 
ing  monitored.  We  ran  Snort  1.6.3  on  a  dataset 
of  8.7  million  packets  totalling  more  than  1  giga¬ 
byte  captured  over  a  25  minute  period  from  the  In¬ 
ternet  connection  of  a  large  enterprise  with  mixed 

^  ftp://  ftp.cert.org/ pub/cert_advisories/CA- 
96.  ll.interpreters_in_cgi_bin_dir 
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String  match 
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31% 

Packet  classification 

EvalHeader 
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Packet  classification 
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6.7% 

Other  matching 

EvalOpts 

5.8% 

Figure  2:  Profile  of  Snort 

business  and  scientific  users.  Using  the  full  rule- 
set  from  Snort.org  that  accompanied  this  release  of 
Snort  and  the  gprof  [12]  execution  profiler,  we  show  that 
string  searching  is  the  most  expensive  part  of  the 
execution  and  accounts  for  31%  of  the  total  exe¬ 
cution  time.  The  top  four  routines  are  shown  in 
Figure  2.  The  second  most  run  routine  accounts 
for  less  than  10%  of  total  execution  time. 

One  common  optimization  is  to  use  a  kernel  or 
hardware  prefilter  to  exclude  packets  that  are  not 
of  interest  to  the  detection  system.  Such  prefilter¬ 
ing  would  only  increase  the  fraction  of  packets  on 
which  string  searches  would  be  performed. 

2.3  Algorithmic  Performance  Attacks 

Who  would  benefit  if  Snort  could  be  made  faster?  In 
[26],  Paxson  presents  a  taxonomy  of  attacks  including 
the  overload  attack  in  which  an  attacker  overloads  an 
intrusion  detection  system  by  flooding  it  with  innocuous 
packets  until  the  system  starts  dropping  packets.  There 
is  then  a  high  probability  that  the  detection  system  will 
not  catch  an  attack  that  is  interjected  in  this  stream 
of  packets.  To  have  strong  assurance  that  a  detection 
system  is  not  subject  to  such  attacks,  the  system  must 
be  able  to  support  full  utilization  of  the  network  that 
it  is  monitoring. 

An  ideal  target  of  such  attacks  is  what  we  call  an  algo¬ 
rithmic  performance  vulnerability  in  which  the  per¬ 
formance  of  a  system  is  dependent  on  the  inputs 
to  certain  algorithms.  Thus,  an  attacker  can  in¬ 
tentionally  provide  inputs,  in  the  form  of  packets, 
that  will  knowingly  cause  the  worst-case  performance  of 
an  algorithm.  A  detection  system  that  does  not 
support  full  network  utilization  with  worst-case  pack¬ 
ets  is  vulnerable  to  an  algorithmic  performance  at¬ 
tack.  Paxson  suggests  that  the  site-specific  policies 
provide  the  most  opportunity  for  loading  and  that 
these  policies  will  not  be  known  to  an  attacker.^ 
However,  it  cannot  be  safely  assumed  that  detec¬ 
tion  platforms  will  not  make  use  of  common  algo¬ 
rithms  and  their  associated  worst-case  performance 
vulnerabilities. 

The  bottom  line  is  that  improving  Snort  performance 

^The  strength  of  such  security  through  obscurity  is  subject  to  debate. 


increases  the  likelihood  of  an  attack  being  detected. 

2.4  Set  String  Matching  Options 

Given  that  string  matching  is  a  bottleneck  and  per¬ 
formance  is  important,  we  note  that  there  are  at 
least  two  ways  that  an  efficient  set  string  match¬ 
ing  algorithm  could  be  used  to  improve  the  per¬ 
formance  of  Snort.  These  include: 

1 )  Single  Set  of  Strings:  We  can  search  for  the 
strings  of  all  rules  simultaneously.  The  results  can 
be  used  in  two  ways.  First,  we  can  immediately 
exclude  rules  that  have  content  strings  which  do 
not  match.  Alternatively,  we  can  cache  the  re¬ 
sults  and  refer  to  them  whenever  a  string  match 
is  called  for.  A  possible  disadvantage  is  that  a 
given  packet  may  match  only  a  small  subset  of  the  rules, 
while  the  single  set  string  search  (unnecessarily)  in¬ 
cludes  in  its  search  target  strings  that  the  packet 
may  not  match.  Such  unnecessary  inclusion  may, 
for  example,  make  the  data  structure  too  large  to 
fit  in  cache 

2)  Multiple  String  Sets:  It  is  common  in  Snort 
to  have  several  rules  whose  criteria  differ  only  in 
content  string.  Thus,  we  only  group  together  such 
rules  into  subsets  and  do  efficient  string  matching 
within  each  subset  of  rules,  while  still  doing  mul¬ 
tiple  string  searching  across  the  subsets.  The  cur¬ 
rent  Snort  implementation  uses  subsets  of  size  1. 
A  possible  disadvantage  of  this  approach  is  that 
we  still  have  to  iterate  over  subsets  unlike  in  the 
case  of  the  Single  Set  Approach.  A  possible  ad¬ 
vantage  is  that  we  can  specialize  the  search  eas¬ 
ily  to  do  standard  Boyer-Moore  for  a  rule  subset 
if  the  size  of  the  subset  is  below  some  threshold. 

3  String  Search  Algorithms 

Before  we  introduce  our  setwise  Boyer-Moore-Horspool 
string  matching  algorithm,  we  briefly  review  the  string 
matching  problem,  and  the  Boyer-Moore  and  Aho- 
Corasick  approaches.  More  information  can  be  found  in 
Gusfield’s  book  [14]  and  a  comprehensive  review  of 
string  matching  [31].  We  focus  specially  on  un¬ 
derstanding  the  main  ideas  behind  standard  Boyer- 
Moore  because  they  are  essential  to  understanding 
many  of  the  multiple-pattern  algorithms,  including 
our  setwise  Boyer-Moore-Horspool  algorithm.  We 
also  briefly  describe  Aho-Gorasick  because  it  is  good 
worst-case  performance  and  is  an  important  candi¬ 
date  for  improving  set  string  matching  that  is  not 
used  currently  in  Snort. 

Assume  a  text  string  T  of  length  n  and  a  pattern 
string  P  of  length  to,  each  composed  of  an  ordered  set  of 
characters  from  an  common  alphabet  A.  The  general 
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problem  is  to  determine  the  location  of  P  within  T,  or 
that  T  does  not  contain  P. 

Let  the  characters  of  T  and  P  be  numbered  se¬ 
quentially  from  left  to  right  starting  with  one.  If, 
for  a  given  offset  or  shift  s,  every  character  Pi  €  Pi... Pm 
matches  the  corresponding  character  G  Ti...r„, 

then  P  occurs  at  offset  s  in  T.  We  write  this 
equality  as  follows: 

7^  T’  _  p  p 

3.1  Boyer-Moore 

Current  Snort  implementations  use  the  Boyer-Moore 
string  matching  algorithm,  which  is  widely  regarded 
as  the  providing  the  best  average-case  performance 
of  any  known  algorithm.  The  algorithm  is  based 
on  a  few  key  observations  that  allow  single  pat¬ 
terns  to  be  found  in  sublinear  time  in  the  aver¬ 
age  case  [4]. 

The  first  observation  is  that  character  comparisons 
can  be  made  from  right  to  left  starting  at  the  end  of  the 
pattern  instead  of  the  beginning.  Let  e  be  the  endpoint 
of  the  pattern  and  e  =  m  initially.  For  j  =  0...(m— 1),  we 
compare  T(,_j  and  Pm-j-  Shifts  are  logically  performed 
by  incrementing  e. 

Bad  Character  Heuristic:  Second,  consider  when  a 
character  mismatch  is  found  between  T^-j  and  Pm-j,  so 
that  j  is  the  length  of  the  matching  suffix  of  P.  If  the  last 
occurrence  in  P  of  the  character  T(,_j  occurs  q  characters 
from  the  end  of  P,  then  there  is  no  point  in  trying  new 
endpoints  less  than  e+q—j.  This  bad  character  heuristic 
allows  us  to  skip  many  of  the  comparisons  of  the  naive 
algorithm.  The  amount  that  we  can  shift  the  endpoint  is 
easily  computed  based  on  a  function  B{c)  that  computes 
the  distance  from  the  end  of  P  to  the  last  occurrence 
of  character  c: 

B{c)  =  min{q  \  Pm-q  =  c} 

For  each  character  c  that  does  not  occur  in  the  pattern 
at  all,  B{c)  =  m.  All  values  for  B  can  be  pre-computed 
and  stored  in  an  array  of  length  |A|. 

If  we  find  a  character  mismatch  at  T^-j,  then  we  can 
safely  set  the  endpoint  e  to  e  -I-  B{Tf.-j)  —  j  and  restart 
the  right  to  left  comparison  of  Te-m+i---Te  and  Pi. ..Pm. 

Note  that  the  last  occurrence  may  not  be  to  the  left  of 
the  current  location  e  —  j,  and  so  B(Te-j)  —  j  <  0.  In 
this  case  the  bad  character  heuristic  provides  no  insight 
and  the  endpoint  must  be  shifted  by  just  one  additional 
position. 

Good  Suffixes  Heuristic:  The  good  suffix  heuris¬ 
tic  is  the  third  key  observation.  If  a  mismatch  is 
found  in  the  middle  of  the  pattern  at  Pm-j,  then 
there  is  a  suffix  of  0  or  more  characters  Pm-j+i.-.Pm 


that  do  match.  Thus,  there  is  no  point  in  test¬ 
ing  new  endpoints  e'  that  do  not  cause  that  same 
suffix  string  to  match  {Tf.'-j+i...Te>  Pm-j+i---Pm)- 

A  stronger  version  of  the  good  suffix  heuristic  is 
attributed  to  Kuipers.  This  version  also  skips  over  re¬ 
occurrences  of  the  suffix  that  are  preceded  by  the 
same  character  as  Pm-j  ■  There  is  no  point  in  attempting 
such  shifts  since  it  is  known  that  Te-j  Pm-j- 
This  optimization  does  not  significantly  complicate 
the  preprocessing  step,  and  allows  the  algorithm  to 
be  provably  linear  even  in  the  worst  case  [14]. 

The  full  Boyer-Moore  algorithm  shifts  by  the  greater 
of  the  values  given  by  the  bad-character  and  good-suffix 
heuristics. 

Examples  of  Boyer-Moore  Operation:  Let  us 

first  consider  the  following  example: 

i 

acdecdacdacda 

acdacda 

Since  the  character  e  does  not  occur  in  the  pat¬ 
tern,  the  bad  character  heuristic  tells  us  than  we 
can  immediately  shift  the  endpoint  by  4  so  that 
the  pattern  starts  after  the  current  character.  The 
good  suffix  heuristic  determines  that  the  suffix  cda 
occurs  3  characters  to  the  left  of  the  current  location, 
acdecdacdacda 

acdacda  Good  suffix  recommendation 

acdacda  Bad  character  recommendation 

We  choose  the  larger  of  the  two  endpoint  shifts,  4 
characters. 

We  start  comparing  the  end  of  the  pattern  at  the  new 
endpoint  and  immediately  find  a  mismatch.  The  bad 
characteristic  heuristic  tells  us  to  shift  by  two.  The  good 
suffix  heuristic  has  nothing  to  add  since  there  is  no  good 
suffix. 

i 

acdecdacdacda 

acdacda  Bad  character  recommendation 

So  we  shift  by  another  two  and  eventually  de¬ 
termine  that  the  string  is  correct.  Note  that  over 
the  entire  comparison,  we  never  examined  the  first 
three  characters,  but  examined  the  seventh  charac¬ 
ter  (the  first  mis-match  point)  twice. 

Performance  Analysis  of  BM:  Clearly  the  perfor¬ 
mance  of  the  BM  algorithm  depends  on  the  characters  in 
the  text  and  pattern  strings.  Knuth  showed  that 
the  original  Boyer-Moore  algorithm  performs  0{n  -\- 
rm)  comparisons,  where  r  is  the  number  of  times 
the  pattern  occurs  in  the  text.  For  r  =  0,  6n 
comparisons  are  required  [20,  31]. 

Cole  showed  that  Boyer-Moore  with  the  stronger 
good  suffix  heuristic  requires  only  4n  comparisons 


5 


if  the  pattern  does  not  occur  in  the  text.  When 
the  pattern  occurs  r  times  in  the  text,  the  number 
of  comparisons  is  0(rn)  [14].  This  complex  anal¬ 
ysis  does  not  even  include  any  advantages  caused 
by  the  bad  character  heuristic. 

Galil  developed  a  slight  modification  to  Boyer-Moore 
that  allows  a  proof  of  0(n)  performance  regardless  of  the 
number  of  occurrences  [11,  14].  In  1986,  Apos- 
tolico  and  Giancarlo  developed  a  variant  of  Boyer- 
Moore  algorithm  that  requires  at  most  2n  compar¬ 
isons,  but  requires  0(m)  additional  space  [2,  31,  14]. 
In  1996,  Grochemore  and  Lecroq  showed  that  the 
Apostolico-Giancarlo  algorithm  has  a  tight  bound  of  at 
most  1.5n  comparisons  [8].  This  bound  is  better 
than  the  2n  of  the  KMP  algorithm  and  encroaches 
on  Ri vest’s  lower  bound  of  n  —  m  +  1  comparisons 
for  any  algorithm’s  worst  case  performance  [27]. 

Many  performance  studies  of  Boyer-Moore  and  its 
variants  have  been  performed.  Despite  worst-case 
performance  that  is  super-linear,  the  observed  av¬ 
erage  case  performance  is  typically  much  better.  Hor- 
spool  studied  a  version  of  Boyer-Moore  that  does 
not  perform  the  good  suffix  comparisons.  This  version  is 
0(nm)  in  the  worst  case,  but  performs  compara¬ 
bly  to  the  original  Boyer-Moore  algorithm  in  the 
average  case  [31]. 

3.2  Aho-Corasick 

The  Aho-Gorasick  algorithm  [1]  is  the  classic  algo¬ 
rithm  for  searching  for  multiple  patterns  simultane¬ 
ously.  Roughly  speaking,  the  Aho-Gorasick  algo¬ 
rithm  uses  the  structure  of  a  finite  automaton  that 
accepts  all  strings  in  the  set.  The  automaton  is 
structured  so  that  every  prefix  is  represented  by  only  one 
state,  even  if  the  prefix  begins  multiple  patterns. 
When  the  next  character  in  the  text  is  not  one 
of  the  expected  next  characters  in  the  pattern,  a 
failure  link  (an  e-transition)  is  taken  to  the  state 
representing  the  longest  prefix  of  a  pattern  that  is 
also  the  proper  suffix  of  the  current  state.  The 
Aho-Gorasick  algorithm  is  0(n)  and  the  precompu¬ 
tation  is  linear  in  the  size  of  the  pattern,  0(m)  [14]. 

4  Set-wise  Boyer-Moore 

In  this  section,  we  describe  a  newly  developed  al¬ 
gorithm  for  matching  sets  of  strings.  We  will  discuss 
previous  work  on  the  category  of  sub-linear  time, 
multiple-pattern  algorithms,  and  describe  the  trade¬ 
offs  that  led  us  to  our  new  algorithm.  This  class 
of  algorithms,  including  our  own,  performs  better 
than  Aho-Gorasick  in  many  cases  by  using  concepts 
from  the  Boyer-Moore  family  of  algorithms. 

Snort  currently  uses  a  single-pattern  Boyer-Moore 


algorithm  repeated  separately  for  each  pattern.  Re¬ 
peating  the  Apostolico-Giancarlo  algorithm  for  each 
of  k  patterns  would  result  in  at  most  1.5nk  com¬ 
parisons.  In  the  best  case,  at  least  ^  compar¬ 
isons  would  be  required  for  a  pattern  of  length  m. 
For  k  patterns  all  of  length  m,  the  total  cost  is 
therefore  between  ^  and  1.5nfc.  When  k  >  2m, 
then  the  Aho-Gorasick  algorithm  is  clearly  superior 
with  its  worst  case  performance  of  2n. 

However,  the  techniques  of  the  Boyer-Moore  family 
of  algorithms  can  be  adapted  to  operate  on  sets 
of  patterns  in  a  single  pass  rather  than  iteratively. 
Gommentz- Walter  [7]  introduced  the  first  such  al¬ 
gorithm  and  Watson  showed  that  the  actual  per¬ 
formance  of  Gommentz- Walter  is  much  better  than 
Aho-Gorasick,  particularly  with  long  patterns  [33]. 
Gusfield  argued  for  a  simpler  algorithm  that  uses 
suffix  trees  to  compute  the  good  suffix  heuristic  [14] .  Wu 
and  Manber  developed  other  techniques  for  imple¬ 
menting  Boyer-Moore  heuristics  for  multiple  patterns 
and  achieved  performance  significantly  better  than 
Aho-Gorasick  or  Gommentz-Walter  for  natural  lan¬ 
guage  texts  [35]. 

Many  of  these  algorithms  have  increasingly  good, 
but  complex  algorithms  for  skipping.  As  evidenced 
by  Horspool’s  variant  of  Boyer-Moore  [16],  simpler 
search  algorithms  can  often  perform  better  than  algo¬ 
rithms  which  skip  more  characters  per  comparison, 
but  require  much  more  work  per  skip.  Horspool 
uses  only  the  bad  character  heuristic  of  Boyer  and 
Moore  and  not  the  good  suffix  heuristic,  without 
loss  of  average-case  performance.  In  this  section,  we 
introduce  a  new  a  multiple-pattern  algorithm  that 
preserves  the  simplicity  of  Boyer-Moore-Horspool. 

The  basic  procedure  of  the  algorithm  is  as  follows. 
The  initial  position  of  the  endpoint  e  is  the  length  of 
the  shortest  pattern.  The  set  of  patterns  can  quickly 
be  compared  to  any  position  in  the  text  by  storing  the 
reversed  patterns  in  a  trie.  Gomparisons  continue  from 
right  to  left  until  a  character  is  found  in  the  text  that 
does  not  match  a  next  character  in  the  trie. 

Recall  that  in  our  description  of  the  Boyer-Moore 
algorithm,  the  bad  character  heuristic  merely  depends 
on  knowing  the  distance  from  the  end  of  the  pat¬ 
tern  of  the  last  occurrence  of  each  possible  charac¬ 
ter.  This  same  pre-computation  can  be  performed 
by  finding  the  minimum  of  the  shifts  of  each  pat¬ 
tern;  let  B(c)  be  the  bad  character  function  for 
the  set  as  a  whole  and  Bi{c)  be  the  bad  charac¬ 
ter  functions  for  the  k  individual  patterns  such  that: 

B{c)  =  min{Bi{c),  1  <  I  <  k} 

Glearly  this  unified  B(c)  function  is  as  conservative  as 
the  most  conservative  of  its  component  functions.  Thus 
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the  correctness  of  the  algorithm  remains  unchanged. 

Note  that  we  have  carefully  described  the  bad 
character  heuristic  in  this  paper  without  referring 
to  a  shift  of  the  starting  character  of  the  pattern. 
Most  presentations  of  this  material  depend  on  the 
starting  character.  By  making  this  simple,  but  incon¬ 
sequential  translation  in  the  description  of  Boyer- 
Moore,  we  make  it  clear  that  the  same  heuristics 
can  be  applied  to  searching  a  set  of  patterns  in  parallel. 

4.1  Average  Case  Analysis 

We  now  examine  the  average  performance  of  this  set¬ 
wise  Boyer-Moore-Horspool  algorithm  (SBMH)  using  a 
probabilistic  model.  Performance  is  measured  in  terms 
of  the  number  of  characters  that  must  be  examined  per 
character  of  shift.  Let  the  size  of  the  alphabet  be  a. 
We  assume  uniform  distributions  of  characters  in  both 
the  text  and  all  k  patterns: 

Pc  =  Pr[Any  given  character]  =  — 

The  size  of  each  shift  is  determined  by  B{c)  and  the  size  t 

shifts 

comparisons 
max  {l,B{Te-t)  -  t) 
t  +  1 


of  the  matching  suffix: 

Performance  = 


The  probability  that  the  matching  suffix  length  t 
equals  some  amount  x  is  simply  the  probability  of 
X  consecutive  characters  in  the  text  matching  one 
or  more  of  k  corresponding  characters  in  the  pat¬ 
terns  and  one  final  character  not  matching: 

=  t<m' 

\  {kpcY  t  =  m' 

Thus,  the  expected  value  of  t  is  based  on  probabilities 
of  each  value  of  t  from  0  to  the  size  of  the  longest  pattern. 


m'  —  1 

E[t]  =  ^  x(l  -  kpc)(kpcY  +  m'{kpc)""' 

x—0 

It  is  clear  that  t  is  dominated  by  the  likelihood  that 
the  first  character  will  not  match.  Given  truly  random 
input,  the  probability  of  matching  any  given  string  in  an 
alphabet  of  significant  size  is  clearly  low. 

The  probability  that  the  bad  character  function 
B{c)  =  s,s  <  m'  is  equal  to  the  probability  that  each 
the  last  s  characters  differs  from  the  correspond¬ 
ing  k  characters  of  the  patterns  and  that  at  least 
one  of  the  patterns  has  the  same  character  c  at 


Figure  3:  Performance  for  different  set  sizes 


position  e  —  j.  B{c)  =  m'  is  the  only  remain¬ 
ing  case  and  occurs  when  all  m'  characters  mismatch: 


Pr[B{c)  =  s] 


kpcY  ~  kpcY  s  <  m' 
(1  —  kpc)‘  s  =  m' 


Therefore,  the  expected  value  of  the  bad  character 
function  is  as  follows: 


m'  —  1 

E[B{c)]  ^  ^  skpc{l  -  kpcY  +  m'{l  -  kpc)"^ 

s=0 


We  now  have  enough  information  to  compute  the  total 
shift  size  as  follows: 

ff  —  f  ^  k3{c)  <  t 

Figure  3  shows  the  expected  performance  computed 
for  various  lengths  of  patterns  (m')  over  multiple  set 
sizes.  The  alphabet  is  256  possible  values  for  a  network 
octet.  For  sets  of  size  1,  performance  improves  quickly  as 
the  pattern  gets  longer  since  each  character  mismatch  is 
likely  to  produce  B(c)  =  m'  which  is  growing.  However, 
as  the  number  of  patterns  in  the  set  increases  as  well,  it 
becomes  more  likely  that  a  bad  character  will  occur  late 
in  one  of  the  patterns  and  the  shift  will  be  small.  Thus 
there  is  a  sweet-spot  for  our  algorithm  that  is  dependent 
on  the  size  of  the  set  as  well  as  the  characteristics  of 
the  patterns  in  the  set. 

For  50  sets  of  100  characters  each,  roughly  1.6 
comparisons  are  made  for  each  character  shift  of 
progress.  This  value  is  unsurprising  given  the  tight 
bound  of  1.5n  found  for  other  variants  of  Boyer-Moore. 

In  Figure  4,  the  progress  has  been  divided  by  the 
number  of  iterations  necessary  to  compare  a  total 
of  50  patterns.  This  normalization  lets  us  com¬ 
pare  the  performance  of  using  batches  of  z  patterns 
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Figure  5:  Software  Library  Performance 


in  ^  iterations  for  z  =  1,5,10,20,50.  The  re¬ 
sulting  graph  shows  that  no  single  set  size  is  op¬ 
timal  for  all  scenarios.  In  general  it  is  better  to 

do  many  iterations  over  small  sets  or  one  itera¬ 
tion  over  large  set  sizes.  A  few  iterations  over 

medium-sized  sets  is  rarely  preferable.  Note  that 
for  virtually  all  conditions,  there  is  some  combina¬ 
tion  that  is  preferable  to  the  Aho-Corasick’s  best 

case  performance  of  1.  Choosing  the  proper  set 
size,  however  is  problematic. 

5  Experimental  Performance 

The  complexity  of  implementing  the  Boyer-Moore  al¬ 
gorithm  has  been  cited  as  one  of  the  reasons  why 
it  has  not  been  used  more.  We  therefore  devel¬ 
oped  a  general  purpose  library  for  matching  a  set 
of  strings  using  one  of  several  algorithms.  This  library 
is  written  in  C  and  will  be  freely  distributed  for 
use  by  other  projects.  The  library  currently  im¬ 
plements  Boyer-Moore,  Boyer-Moore-Horspool,  Aho- 
Corasick,  and  our  SBMH  algorithm.  For  more  even  com¬ 
parison,  the  implementations  of  both  Aho-Corasick 
and  our  SBMH  algorithm  use  the  same  trie  data 
structure  implementation. 

Over  the  course  of  our  tests  we  repeatedly  found 
that  subtle  optimizations  to  the  performance  of  an 
algorithm’s  implementation  can  have  dramatic  effects  on 
the  relative  performance  of  algorithms.  Thus  any 
performance  comparison  of  these  algorithms  should 
not  overlook  the  quality  of  the  implementation.  We 
hope  that  other  groups  will  make  use  of  (and  un¬ 
doubtedly  improve)  our  library  in  order  to  ensure 
meaningful  performance  comparisons  in  the  future. 

Case  Insensitivity:  Roughly  half  of  the  strings  in  the 
Snort  ruleset  are  case  insensitive.  Given  the  large 
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Figure  6:  Software  Library  Performance 


number  of  Internet  applications  that  use  case  insensitive, 
ASCII  text-based  protocols,  this  is  not  surprising. 
Our  string  matching  library  has  functions  special¬ 
ized  for  case  sensitive  and  insensitive  searches  us¬ 
ing  Boyer-Moore  and  Boyer-Moore-Horspool.  Our 
implementations  of  Aho-Corasick  and  Setwise  Boyer- 
Moore-Horspool  are  always  insensitive  to  case.  This 
is  done  at  precomputation  time  by  making  redun¬ 
dant  entries  for  both  upper  and  lower  case  in  the 
tables  used  for  trie  branches  and  the  Boyer-Moore 
bad  character  heuristic.  At  run  time,  if  there  is 
a  case-insensitive  match,  but  the  pattern  is  case- 
sensitive,  then  we  double-check  the  byte-strings  to 
ensure  an  exact,  case-sensitive  match.  Since  matches  are 
relatively  infrequent,  this  extra  cost  is  only  occa¬ 
sionally  incurred. 


5.1  Library  Benchmarks 

To  compare  the  actual  running  time  of  the  four 
algorithms,  we  pick  sets  of  words  from  an  English 
dictionary  and  search  for  them  in  randomly  generated 
binary  strings.  Figure  5  shows  a  log-scale  repre¬ 
sentation  of  the  time  required  for  each  of  the  al¬ 
gorithms  to  perform  case  sensitive  searches  through 
sets  of  sizes  1  through  500.  Times  are  measured 
using  Pentium  hardware  counters  accumulated  over 
1000  runs.  The  two  variants  of  the  standard  Boyer- 
Moore  algorithm  perform  similarly  and  for  most  set 
sizes  are  much  worse  than  both  of  the  set-search 
algorithms.  Figure  6  is  a  more  detailed  plot  of 
only  the  set  algorithms. 

There  are  several  interesting  points  regarding  the  rel¬ 
ative  performance  of  Aho-Corasick  and  our  SBMH 
algorithm.  First,  we  see  that  for  sets  with  fewer 
than  roughly  100  members,  the  character  skipping 
heuristics  of  SBMH  allow  it  to  perform  significantly  bet¬ 
ter  than  Aho-Corasick.  As  described  in  our  theo¬ 
retical  analysis,  the  advantage  of  the  bad  character 
heuristic  degrades  as  the  number  of  overall  charac¬ 
ters  in  the  pattern  set  increases.  As  this  num¬ 
ber  rises,  so  does  the  probability  that  a  character 
will  occur  near  the  end  of  a  pattern.  Thus,  the 
performance  of  SBMH  degrades  until,  at  set  sizes 
of  around  100  elements,  it  begins  performing  worse 
than  Aho-Corasick. 

A  second  point  of  interest  is  that  Aho-Corasick, 
which  has  proven  linear  performance,  also  experi¬ 
ences  a  noticeable  (but  constant)  performance  degra¬ 
dation  for  sets  of  roughly  100  or  more  elements. 
We  believe  that  this  crossover  is  due  to  capacity 
misses  in  the  cache.  The  average  length  of  the 
patterns  in  this  experiment  is  7.26  characters.  For 
the  set-wise  algorithms,  the  size  of  the  data  struc¬ 
tures  used  by  the  library  is  roughly  260  bytes  per 
character.  Out  test  program  would  therefore  re¬ 
quire  roughly  139  words  to  fill  the  256KB  on-chip  cache. 

5.2  Integration  with  Snort 

The  following  measurements  were  made  on  a  733MHz 
Pentium  HI  (Coppermine)  system  with  256KB  of 
on-chip  cache  and  running  Linux  2.2.18.  We  used 
Snort  version  1.6.3-p2  modified  to  use  our  string 
matching  library.  The  Snort  ruleset  used  was  the 
full  ruleset  from  the  Snort  1.6.3-p2  distribution  with 
the  “ping”  and  “backdoor”  rules  disabled  (a  com¬ 
mon  configuration).  Timing  information  was  ac¬ 
quired  with  the  gprof  [12]  execution  profiler. 

Snort  uses  libpcap  [18]  for  capturing  and  filter¬ 
ing  packets.  As  a  result.  Snort  can  read  packet 
traces  stored  by  programs  like  tcpdump.  We  used 
this  capability  to  replay  fixed  traces  with  determin¬ 


istic  results.  Operating  on  live  traffic  would  make 
any  results  inconclusive  due  to  potential  variations 
in  traffic  between  tests. 

The  pcap  library  uses  a  callback  to  run  the  appli¬ 
cation’s  packet  handler  on  each  packet.  The  timing  mea¬ 
surements  shown  below  exclude  the  costs  of  pcap 
itself  since  our  experiments  read  packets  from  disk 
rather  the  network.  Optimizing  the  performance 
of  lihpcap  is  an  area  of  our  current  research.  In 
particular,  the  use  of  ring  buffers  shared  by  the 
kernel  and  the  application  improve  performance  sig¬ 
nificantly  by  removing  system  call  overhead. 

Decision  Structure:  Snort  rules  are  categorized  by 
which  protocol  (TCP,  UDP,  or  ICMP)  they  apply  to. 
Since  any  given  rule  may  only  apply  to  one  of  these 
protocols,  they  form  mutually  exclusive  sets  of  rules. 
Rules  are  then  grouped  by  their  action  Pass,  Alert,  and 
Log,  with  each  group  being  processed  separately  in  a 
configurable  order. 

In  order  to  minimize  the  amount  of  time  spent 
performing  expensive  string  matching.  Snort  searches 
for  strings  only  after  comparing  header  attributes. 
For  each  group  (Pass,  Alert,  Log)  of  each  proto¬ 
col  (TCP,  UDP,  or  ICMP)  rules  are  stored  in  an 
unordered,  two-dimension  linked  list.  The  first  di¬ 
mension  contains  a  list  of  the  different  filters  used 
by  rules.  We  use  the  term  filters  to  refer  to  tra¬ 
ditional  packet  classification  criteria  such  as  IP  ad¬ 
dresses  and  TCP  or  UDP  ports.  Each  filter  on 
this  first  list  contains  a  list  of  every  rule  that  is 
predicated  on  that  filter.  Each  of  these  rules  may 
then  define  other  criteria  such  as  TCP  flags  and 
payload  contents. 

We  chose  to  modify  this  structure  as  little  as 
possible  in  order  to  make  it  a  controlled  variable 
in  our  experiments.  Thus  we  employed  the  tech¬ 
nique  of  lazily  performing  a  set-wise  string  search 
the  first  time  any  element  of  that  set  is  tested. 
However,  we  cache  the  results  of  this  set  search  so 
that  subsequent  tests  of  other  elements  of  the  set 
need  only  check  to  see  whether  or  not  that  ele¬ 
ment  was  flagged  as  found  during  the  set  search. 

5.3  Average  Traffic 

We  ran  our  modified  Snort  on  a  dataset  of  the 
full  contents  of  8.7  million  packets  totalling  more 
than  1  gigabyte  captured  over  a  25  minute  period 
from  the  Internet  connection  of  a  large  enterprise 
with  mixed  business  and  scientific  users.  We  con¬ 
ducted  several  experiments  involving  different  ways 
of  constructing  sets. 

First,  we  built  a  small  number  of  sets  based  on 
Snort’s  initial  decision  criterion,  the  protocol.  Thus  we 
build  3  sets  of  strings  for  TCP,  UDP,  and  ICMP 
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Method 

Min  Set  Size 

Total  Time 

(seconds) 

Set  Search 

(seconds) 

Singleton  Search 

(seconds) 

Boyer-Moore 

— 

136.16 

0 

136.16 

Horspool 

— 

102.83 

0 

102.83 

Aho-Corasick 

1 

132.73 

132.73 

0 

SBMH 

1 

104.65 

104.65 

0 

Aho-Corasick 

2 

93.95 

84.36 

9.59 

SBMH 

2 

91.66 

82.60 

9.06 

Aho-Corasick 

3 

101.21 

87.78 

10.69 

SBMH 

3 

96.42 

83.69 

9.96 

SBMH  /  Aho-Corasick 

2 

89.76 

Figure  8:  Search  times  with  one  set  for  each  filter 


Method 

Min  Set  Size 

Total  Time 

(seconds) 

Set  Search 

(seconds) 

Singleton  Search 

(seconds) 

Boyer-Moore 

— 

21.25 

0 

21.25 

Horspool 

— 

22.47 

0 

22.47 

Aho-Corasick 

2 

4.76 

3.19 

0 

Aho-Corasick 

2 

4.36 

3.22 

0 

Aho-Corasick 

2 

4.74 

3.50 

0 

SBMH 

2 

4.63 

3.66 

0 

SBMH 

2 

4.98 

3.97 

0 

SBMH 

2 

4.73 

3.51 

0 

Figure  9:  Search  times  for  worst-case  traffic 


Method 

Total  Search  Time 

(seconds) 

Boyer-Moore 

136.16 

Horspool 

102.83 

Aho-Corasick 

235.76 

Setwise  Horspool 

1246.83 

Figure  7:  Search  time  with  all-encompassing  sets 

with  sizes  490,  60,  and  9,  respectively.  The  break¬ 
down  of  traffic  in  this  packet  trace  as  98.199%  TCP, 
1.499%  UDP,  and  0.257%  ICMP.  Figure  7  compares 
the  total  amount  of  time  spent  performing  string 
searches  using  Aho-Corasick  and  our  Setwise  Hor- 
spool  algorithm  to  the  time  required  with  Snort’s 
traditional  method  of  running  Boyer-Moore  separately 
for  each  rule  being  checked.  Overall  performance 
actually  suffers  if  setwise  algorithms  are  employed 
in  this  manner.  The  first-order  conclusion  is  that 
set-wise  string  searching  must  be  carefully  applied 
to  this  problem.  The  results  of  library  tests  pre¬ 
sented  in  section  5.1  show  that  very  large  sets  can 
be  much  more  expensive.  Thus,  we  examine  the 
tradeoff  of  having  larger  numbers  of  smaller  sets. 


For  each  monitored  packet,  the  optimal  solution 
is  to  have  a  set  that  contains  every  pattern  string 
for  every  rule  that  matches  that  packet,  but  no 
additional  strings.  Unfortunately,  the  number  of 
permutations  of  overlapping  rules  makes  this  pro¬ 
hibitively  expensive.  Thus,  we  employ  heuristics 
to  approach  this  optimum. 

Given  Snort’s  decision  structure  described  in  sec¬ 
tion  5.2,  it  is  natural  to  build  one  set  per  filter. 
There  are  two  limitations  to  this  approach.  First 
the  additional  criteria  expressed  by  each  rule  may 
mean  that  only  a  small  portion  of  the  set  gets 
used  for  any  particular  packet.  Second,  a  packet 
may  match  multiple  filters  requiring  multiple  sets 
to  be  searched.  The  result  of  this  structuring  is 
that  strings  are  now  partitioned  in  64  sets  instead  of  3. 

For  singleton  sets,  the  set-wise  algorithms  have 
a  higher  overhead  and  are  slower  compared  to  the  tra¬ 
ditional  Boyer-Moore  and  Boyer-Moore-Horspool  al¬ 
gorithms.  Of  the  64  sets,  24  of  them  are  single- 
tons.  Thus,  we  create  a  set  size  threshold  under 
which  the  traditional  algorithms  are  always  used. 

Figure  8  shows  the  results  of  this  set  of  exper¬ 
iments.  The  first  two  entries,  the  single  pattern 
search  algorithms,  are  the  same  as  before.  The 
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next  two  entries  show  the  performance  of  the  set¬ 
wise  algorithms  when  there  is  one  set  for  each  fil¬ 
ter  (a  total  of  64  sets).  Performance  is  better 
than  in  Figure  7,  but  not  significantly  better  than 
Snort  currently  is. 

For  the  fifth  and  sixth  entries,  the  threshold  for 
using  the  set-wise  algorithm  has  been  raised  to  two 
so  that  all  singleton  sets  will  be  processed  with 
the  Boyer-Moore-Horspool  algorithm  rather  than  a 
set-wise  algorithm.  The  right-hand  column  shows 
that  for  the  setwise  algorithms,  the  vast  majority 
(roughly  90%)  of  the  time  is  still  spent  perform¬ 
ing  setwise  searches.  Yet,  the  overall  performance 
is  now  as  much  as  1.5  times  faster  than  Snort’s 
current  algorithm,  Boyer-Moore. 

The  next  two  entries  show  the  effect  of  raising 
the  threshold  for  using  the  set-wise  algorithms  to 
three.  This  proves  to  be  counter-productive  since 
it  results  in  more  searches  per  packet  by  splitting 
sets  of  two  into  two  separate  searches.  Thus,  the 
best  performance  is  achieved  by  making  one  set  for 
each  filter  and  reserving  the  use  of  the  set-wise  algo¬ 
rithms  to  sets  with  more  than  one  element.  The 
result  of  using  this  technique  rather  than  Snort’s 
current  algorithm  is  a  speed-up  of  1.49.  Note  that  in 
all  cases  studied  in  this  series  of  experiments,  our 
SBMH  algorithm  performed  marginally  better  than 
Aho-Corasick. 

Section  5.1  demonstrated  that  in  practice,  there 
is  cross-over  point  in  set  size  where  our  SBMH  al¬ 
gorithm  performs  best  below  this  point,  but  be¬ 
yond  this  point,  Aho-Corasick  performs  better.  We 
therefore  modify  our  set  construction  algorithm  in 
Snort  to  handle  three  different  classes  of  sets.  For 
singleton  sets  we  use  Boyer-Moore-Horspool  which 
has  the  best  performance  for  single  strings.  For 
sets  of  size  two  through  100,  we  use  our  SBMH 
algorithm,  and  for  sets  of  more  than  100,  we  use 
Aho-Corasick.  The  results  of  this  technique  are 
shown  in  the  final  entry  of  Figure  8.  The  speed¬ 
up  over  Snort’s  current  algorithm  increases  to  1.52. 

We  note  that  the  threshold  of  100  is  dependent  upon 
the  implementation  and  platform  being  used.  However, 
the  sets  in  our  experiments  skip  from  a  size  of  31  to  310. 
Thus,  we  have  been  unable  to  determine  the  optimum 
threshold  for  our  implementation. 

5.4  Worst-case  Traffic 

In  terms  of  effect  on  load,  the  worst-case  traffic 
is  that  which  requires  many  string  searches.  The 
largest  single  component  of  commonly  distributed  Snort 
rulesets  is  rules  that  search  web  requests  for  well- 
known  implementation  vulnerabilities  such  as  CGI 
scripts.  Our  ruleset  contains  310  such  strings  that 


apply  to  TCP  port  80  packets.  Packets  that  match 
this  filter  are  the  the  worst-case  load  for  a  detec¬ 
tion  system  which  must  check  each  packet  against 
all  310  rules.  However,  it  is  this  sort  of  work¬ 
load  that  best  exposes  the  benefits  of  the  setwise 
string  matching  algorithms. 

We  filtered  our  original  dataset  to  extract  only 
the  packets  that  match  these  filters.  The  result  is 
a  dataset  of  13084  such  web  request  packets.^  It 
is  worth  noting  that  such  traffic  may  in  fact  dom¬ 
inate  the  workload  of  many  e-commerce  networks’ 
incoming  packets.  In  addition,  an  adept  attacker 
wishing  to  overload  a  detection  system  would  want 
to  generate  this  sort  of  hard  to  handle  traffic,  so 
it  is  important  that  Snort  can  process  it  efficiently. 

For  the  next  experiment,  we  tested  the  perfor¬ 
mance  of  Aho-Corasick  and  SBMH  on  this  worst- 
case  traffic.^  The  results  are  shown  in  Figure  9. 
Because  of  the  closeness  of  some  of  the  figures,  we 
present  results  from  three  different  runs  for  the  set¬ 
wise  algorithms.  The  improvement  over  Snort’s  cur¬ 
rent  Boyer-Moore  algorithm  is  much  more  striking 
for  this  experiment.  Switching  from  the  currently 
used  algorithm  (Boyer-Moore)  to  our  SBMH  algo¬ 
rithm  (with  an  average  search  time  of  4.78  seconds) 
results  in  an  average  speedup  factor  of  4.6.  Be¬ 
cause  this  experiment  is  dominated  by  a  very  large 
set  (310  members),  it  is  not  surprising  that  Aho- 
Corasick  performs  slightly  better  with  an  average 
search  time  of  4.62  seconds.  For  this  experiment, 
the  difference  between  SBMH  and  Aho-Corasick  is 
inconclusively  small. 

5.5  Performance  Summary 

As  more  and  more  vulnerabilities  are  discovered  for 
a  protocol  or  application,  the  number  of  rules  that 
share  a  common  filter  for  that  protocol  or  appli¬ 
cation  will  grow.  For  example,  the  web  already 
accounts  for  310  nearly  identical  rules  that  apply 
to  HTTP  traffic.  We  can  only  expect  that  this 
number  will  continue  to  grow,  as  will  the  number 
of  rules  that  apply  to  other  existing  applications. 

With  regard  to  this  problem,  setwise  algorithms 
scale  much  better  than  algorithms  that  search  each 
rule  individually.  The  results  presented  in  the  pre¬ 
vious  section  deal  with  packets  that  involve  fewer 
rules  on  average  than  this  benchmark.  The  in- 

^Most  of  these  rules  only  look  at  TCP  packets  with  the  push  bit  set. 
While  most  legitimate  web  requests  will  end  in  a  packet  with  the  push  bit, 
a  wily  attacker  would  certainly  choose  to  hide  their  attacks  by  ensuring 
that  the  push  bit  is  not  set.  To  catch  such  attacks,  string  searches  would 
have  to  be  run  on  a  larger  fraction  of  packets,  thus  increasing  the  relevance  of 
this  benchmark  even  further. 

^The  tri-modal  algorithm  presented  at  the  end  of  the  previous  section 
would  use  Aho-Corasick  in  this  case  since  the  data  only  exercises  a  single, 
large  set 
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creased  speedup  between  the  two  benchmarks,  4.6 
compared  to  1.5,  demonstrates  that  the  setwise  al¬ 
gorithms  scale  better  as  the  number  of  rules  per 
filter  grows. 

Figures  5  and  6  demonstrate  this  difference  in 
scalability.  While  the  SBMH  algorithm  that  we 
present  in  this  paper  performs  slightly  worse  than 
Aho-Corasick  for  very  large  sets,  SBMH  performs 
better  for  sets  with  less  than  100  members.  Fi¬ 
nally,  our  best  construction  adaptively  makes  use 
of  Boyer-Moore-Horspool,  Aho-Corasick,  and  SBMH 
to  pick  the  best  of  these  algorithms  for  each  set. 

We  presented  the  idea  of  an  algorithmic  perfor¬ 
mance  attack  in  Section  2.3.  A  load  of  incoming 
web  requests  would  be  an  attractive  way  to  per¬ 
form  such  an  attack  on  most  Snort  systems.  How¬ 
ever,  by  improving  our  performance  for  these  rules 
by  a  factor  of  4.6,  we  have  made  it  that  much 
harder  for  such  an  attack  to  succeed. 

6  Conclusions 

We  have  described  a  broad  class  of  content-based 
packet  handling  applications  (such  as  intrusion  de¬ 
tection,  Layer  7  switching,  and  virus  detection)  and 
their  dependence  on  searching  for  sets  of  strings  in 
packet  data.  Thus,  while  our  paper  focuses  on 
network  based  intrusion  detection  in  order  to  allow 
concrete  evaluation,  our  library  routines  and  new 
Setwise  Boyer-Moore-Horspool  algorithm  should  be 
useful  in  all  these  applications.  Further,  the  SBMH  algo¬ 
rithm  is  of  interest  to  applications  in  other  prob¬ 
lem  domains  that  currently  uses  Aho-Corasick  for 
somewhat  small  sets. 

Much  of  the  paper  focused  on  signature  based  sys¬ 
tems  and  on  improving  the  performance  of  a  pop¬ 
ular  open-source  signature  detection  system  called 
Snort.  On  packet  traces,  we  found  that  replacing 
the  Snort  Boyer-Moore  algorithm  with  a  set  algo¬ 
rithm  (either  Aho-Corasick  or  our  new  algorithm) 
improved  performance  only  slightly  unless  additional 
measures  were  taken  to  intelligently  build  the  pat¬ 
tern  sets.  After  this  additional  structuring  of  the 
problem  we  were  able  to  improve  performance  on 
average  by  50%  and  related  work  to  improve  the 
performance  of  the  search  algorithms  themselves  should 
improve  this  figure  further.  More  impressively,  we  found 
that  for  packet  traces  of  web  traffic,  which  dom¬ 
inates  many  networks,  the  set  algorithms  improve 
the  performance  of  standard  Snort  by  a  factor  of 
5.  This  large  improvement  weakens  an  attacker’s 
ability  to  overload  the  system  by  generating  traf¬ 
fic  that  much  be  checked  against  a  large  number 
of  filters.  We  also  found  that  our  setwise  Boyer- 


Moore-Horspool  outperforms  Aho-Corasick  for  sizes 
up  to  say  100,  upon  which  the  Aho-Corasick  al¬ 
gorithm  does  better.  A  combined  algorithm  that 
switches  from  standard  Boyer-Moore-Horspool  (for  sets 
of  size  1)  to  Setwise  Boyer-Moore-Horspool  (for  sets 
of  sizes  between  2  and  100)  and  finally  to  Aho- 
Corasick  (for  sizes  >  100)  seems  to  outperform  all 
other  variants. 

We  found  that  there  were  a  number  of  implemen¬ 
tation  artifacts  that  suggest  that  optimizing  these 
algorithms  for  the  particular  architecture  and  cache 
size  could  provide  further  improvements.  Even  our 
choice  of  thresholds  to  switch  between  algorithms 
is  likely  to  be  machine  dependent.  Also,  once  the  string 
matching  component  is  no  longer  a  bottleneck,  it 
may  pay  to  improve  the  performance  of  other  as¬ 
pects  of  Snort.  We  leave  all  these  aspects  to  future  work. 
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fers  substantially  from  our  approach.  For  instance, 
they  modified  the  semantics  of  the  rule  definitions 
to  fit  set  search  algorithms.  We  believe  that  our 
structure  and  result  caching  technique  do  not  per¬ 
form  significantly  worse  while  maintaining  the  se¬ 
mantics  and  ordering  or  rules  —  an  important  is¬ 
sue  to  the  authors  of  these  rulesets. 
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